Streaming statistical models via Merge & Reduce
Similar resources
Efficient SPARQL Query Processing via Map-Reduce-Merge
The move towards a “semantic web” is driving the need for efficient querying ability over large datasets consisting of statements about web resources. RDF is a set of standards for describing and modeling data and is the backbone of the semantic web technologies. RDF datasets can be very large, and often are subject to complex queries with the intent of extracting and inferring otherwise unseen ...
Streaming Large Language Models for Statistical Machine Translation
This paper presents an efficient low-memory method for constructing high-order approximate n-gram frequency counts. The method is based on a deterministic streaming algorithm which efficiently computes approximate frequency counts over a stream of data while employing a small memory footprint. We show that this method easily scales to billion-word monolingual corpora using a conventional (4 GB ...
Embedded Software Streaming via Block Streaming
To my mother. Acknowledgments: I would like to express my sincere gratitude and appreciation to everyone who helped make this dissertation possible. First and foremost, I would like to thank my advisor, Professor Vincent J. Mooney III, for his patience and guidance during my graduate study at Georgia Tech. With his knowledge and experience, he has guided me to achieve my research objective. I...
Random Projections for k-Means: Maintaining Coresets Beyond Merge & Reduce
We give a new construction for a small space summary satisfying the coreset guarantee of a data set with respect to the k-means objective function. The number of points required in an offline construction is in $\tilde{O}(k\varepsilon^{-2}\min(d, k\varepsilon^{-2}))$, which is minimal among all available constructions. Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in d...
A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding
Multiple description (MD) coding has evolved as a promising technique for promoting the error resiliency of multimedia systems in real-time applications over error-prone communication channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting reliable data in the context of potential error channels, this method doesn’t consider disc...
Journal
Journal title: International Journal of Data Science and Analytics
Year: 2020
ISSN: 2364-415X, 2364-4168
DOI: 10.1007/s41060-020-00226-0
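As a hedged illustration of the Merge & Reduce scheme named in the title (and in the coreset abstract listed above), the following is a minimal Python sketch, not the article's method. It assumes the general framework: the stream is cut into blocks of `block_size` items, and summaries are kept in levels like a binary counter, so whenever two summaries occupy the same level they are merged and then reduced back to at most `block_size` weighted points before being promoted one level up. The reduce step here (sort by value and collapse consecutive groups into their weighted mean) is a hypothetical choice made for simplicity; it preserves total weight and the weighted sum, so the weighted mean survives exactly. The names `MergeReduce`, `reduce_summary`, `weighted_mean`, and `block_size` are illustrative.

```python
from typing import List, Optional, Tuple

# A summary is a list of (value, weight) pairs.
Summary = List[Tuple[float, float]]


def reduce_summary(points: Summary, size: int) -> Summary:
    """Hypothetical reduce step: compress to at most `size` weighted points.

    Sorts by value, splits into consecutive groups of k = ceil(len/size)
    points, and replaces each group by its weighted mean carrying the
    group's total weight (so total weight and weighted sum are preserved).
    """
    if len(points) <= size:
        return list(points)
    pts = sorted(points)
    k = -(-len(pts) // size)  # ceiling division
    reduced: Summary = []
    for i in range(0, len(pts), k):
        group = pts[i:i + k]
        w = sum(wt for _, wt in group)
        v = sum(val * wt for val, wt in group) / w
        reduced.append((v, w))
    return reduced


def weighted_mean(points: Summary) -> float:
    total = sum(w for _, w in points)
    return sum(v * w for v, w in points) / total


class MergeReduce:
    """Illustrative Merge & Reduce tree over a stream of floats."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.buffer: Summary = []  # current, not yet full block
        # levels[i] holds a summary standing for 2**i full blocks, or None.
        self.levels: List[Optional[Summary]] = []

    def insert(self, x: float) -> None:
        self.buffer.append((x, 1.0))
        if len(self.buffer) == self.block_size:
            self._carry(self.buffer, 0)
            self.buffer = []

    def _carry(self, summary: Summary, level: int) -> None:
        # Binary-counter carry: merge with an occupied level, reduce, move up.
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            existing = self.levels[level]
            if existing is None:
                self.levels[level] = summary
                return
            self.levels[level] = None
            summary = reduce_summary(existing + summary, self.block_size)
            level += 1

    def summary(self) -> Summary:
        # Union of the partial block and all stored level summaries.
        out = list(self.buffer)
        for s in self.levels:
            if s is not None:
                out.extend(s)
        return out


mr = MergeReduce(block_size=8)
for i in range(1000):
    mr.insert(float(i))
s = mr.summary()
print(len(s), weighted_mean(s))
```

With `block_size=8` and 1,000 inserts, each level keeps at most 8 points and the number of levels grows logarithmically in the number of blocks, so memory stays small relative to the stream length; the weighted mean of the final summary matches the stream mean up to floating-point rounding. A real application would swap in a reduce step with a provable guarantee for the model being fitted, such as a coreset construction.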